On the desirability of models for inferring genome phylogenies.
Author
Abstract
Genomes are clearly suited for inferring common ancestry and for understanding ancestor–descendent relationships and interspecies gene transfer. Genomic evolutionary models can tell us a great deal about the processes that drive genome evolution, the mutational and selective pressures that lead to the genesis of biochemical pathways and operons, and the nature and extent of lateral gene transfer (LGT). Simultaneously, a robust phylogeny can be constructed that depicts the evolutionary relationships of the organisms in which the genomes are found. Several approaches have been employed to infer species phylogenies at the genome level. In general terms, these can be divided into ad hoc summary statistics based on genome content, the use of concatenated alignments and the use of consensus methods (i.e. phylogenetic supertrees [1]). The basic premise of methods based on summary statistics is that genomes are compared and a gene content matrix is compiled. Then, either a distance is estimated between all pairs of taxa and entered into a distance matrix that is summarized using a clustering algorithm, or a dendrogram is inferred using maximum parsimony. This is usually referred to as the species phylogeny. The principal difference between this approach and the approaches that use concatenated alignments or supertrees is that information concerning homolog interrelationships is not used. Presence or absence of homologs is the only information that is scored, and this approach can be considered ad hoc in the sense that the methods are applied uniformly to all datasets and, therefore, the assumptions are not informed by the data themselves. Unsurprisingly, the results of using summary statistics have been variable. Although many methods have recovered groups that seem sensible and have support from external biochemical or morphological data, there have been cases in which the inferred trees are unusual [2]. 
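The summary-statistic workflow described above (compile a gene content matrix, estimate pairwise distances, summarize them with a clustering algorithm) can be sketched as follows. This is a minimal illustration, not the procedure of any cited study: the genome names, the homolog family labels, the choice of the Jaccard distance and the choice of average-linkage (UPGMA-style) clustering are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical presence/absence data: genome name -> homolog families present.
GENE_CONTENT = {
    "A": {"f1", "f2", "f3", "f4"},
    "B": {"f1", "f2", "f3", "f5"},
    "C": {"f1", "f6", "f7", "f8"},
    "D": {"f6", "f7", "f8", "f9"},
}

def jaccard_distance(x, y):
    """1 minus (families shared) / (families present in either genome)."""
    return 1.0 - len(x & y) / len(x | y)

def distance_matrix(content):
    """Pairwise distances keyed by the unordered pair of genome names."""
    return {frozenset((a, b)): jaccard_distance(content[a], content[b])
            for a, b in combinations(content, 2)}

def leaves(cluster):
    """Flatten a nested-tuple cluster into its list of genome names."""
    if isinstance(cluster, str):
        return [cluster]
    return [leaf for child in cluster for leaf in leaves(child)]

def average_linkage(content):
    """Repeatedly merge the two closest clusters; return nested tuples."""
    dist = distance_matrix(content)

    def avg(c1, c2):
        pairs = [(a, b) for a in leaves(c1) for b in leaves(c2)]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    clusters = list(content)
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append((c1, c2))
    return clusters[0]

tree = average_linkage(GENE_CONTENT)
print(tree)
```

Only presence or absence enters the calculation, so nothing about the relationships among the homologs themselves informs the resulting dendrogram.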
For example, the haloarchaea are a group of halophilic Archaea, long taken to be members of the Euryarchaeota. Wolf et al. [2] and Korbel et al. [3] placed this taxon at the base of the Archaea. In the figures of Henz et al. [4], this taxon was placed among the Bacteria in one instance, within the Euryarchaeota in a second example and as the deepest-branching Archaeon in a third example. Dutilh et al. [5] point out that the correct placement of the haloarchaea is within the Euryarchaeota and that previous methods placed this taxon erroneously as a deep-branching Archaeon. This erroneous placement is likely to be due to the large number of bacterial genes present in the haloarchaea [6]. The haloarchaea are, therefore, pulled to a position that is intermediate between the two groups from which the haloarchaea genes came. The data violate the ad hoc assumptions of the methods. Problems of this nature argue for the development of explicit genome evolutionary models. Evolutionary models are statements concerning how it is thought that evolution has occurred [7]. If a model were correct, the inferred distances between two genomes would be accurate and would provide consistent estimates of the topology of the resulting phylogeny. The most desirable properties of these models are explicitness when describing the evolutionary process, realism or plausibility of the assumptions contained in the models and clarity in the interpretation of the output [8]. Usually, models are derived in a maximum likelihood framework in which the model consists of the phylogenetic tree of the genomes and the process underlying their evolution [9]. However, even when alternative models are not tested or lengthy computational optimization is not performed, an explicit model of evolution can still be assumed in calculations [10]. A realistic model of genome evolution must, as a minimum, deal with gene duplication and loss, in addition to acquisition of genes by LGT.
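The way acquired genes can pull a taxon towards an intermediate position under a gene content distance, as in the haloarchaea example above, can be illustrated with a toy calculation. All gene sets and counts below are invented for the illustration, and the Jaccard distance is an assumed stand-in for whichever summary statistic a given method uses.

```python
def jaccard_distance(x, y):
    """1 minus (families shared) / (families present in either genome)."""
    return 1.0 - len(x & y) / len(x | y)

# Hypothetical gene families; the names and counts are illustrative only.
archaeal_core = {f"arc{i}" for i in range(10)}
bacterial_core = {f"bac{i}" for i in range(10)}
acquired_by_lgt = {f"bac{i}" for i in range(6)}  # bacterial genes gained by LGT

archaeon = archaeal_core
bacterium = bacterial_core
recipient = archaeal_core | acquired_by_lgt  # archaeon carrying bacterial genes

d_recipient_archaeon = jaccard_distance(recipient, archaeon)
d_recipient_bacterium = jaccard_distance(recipient, bacterium)
d_archaeon_bacterium = jaccard_distance(archaeon, bacterium)

print(f"recipient vs archaeon:  {d_recipient_archaeon:.3f}")
print(f"recipient vs bacterium: {d_recipient_bacterium:.3f}")
print(f"archaeon vs bacterium:  {d_archaeon_bacterium:.3f}")
```

In this toy case the recipient shares its entire archaeal core with the other archaeon, yet its distance to that archaeon is no longer zero and its distance to the bacterium is shorter than the archaeon's distance to the bacterium, because the distance has no term that accounts for transfer.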
This is not to say that all parameters are necessary for all analyses. When models differ in their numbers of free parameters and are nested, a likelihood ratio test can be used to choose the most appropriate parameterization. Gu and Zhang [11] describe a model called the extended genome content distance. This model uses the number of homologs (0, 1 or >1) to derive the genome distance. The model does not take account of horizontal gene transfer and, as a result, the authors report a position for the haloarchaea that is the same as the much simpler method of Korbel et al. [3]. A model has also been developed that deals with LGT, albeit in a slightly different setting [12]. Nonetheless, the development of explicit model-based approaches is to be welcomed as a useful step towards the understanding of genome evolution. When the genomic age began, it was assumed that the huge increase in the amount of available data would result in more-accurate phylogenies. Instead, the extent of apparent genome plasticity has fueled a passionate debate ...
Corresponding author: McInerney, J.O. Available online 3 November 2005. Update, TRENDS in Microbiology Vol. 14 No. 1, January 2006.
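The likelihood ratio test mentioned in the text for nested models can be sketched as below. This is a minimal example under stated assumptions: the two models differ by exactly one free parameter (for instance, a hypothetical LGT rate), the usual chi-square approximation with one degree of freedom applies, and the log-likelihood values are invented for illustration rather than taken from any analysis.

```python
import math

def likelihood_ratio_test(lnl_simple, lnl_complex):
    """LRT for nested models differing by one free parameter.

    Returns the statistic 2 * (lnL_complex - lnL_simple) and its p-value
    under a chi-square distribution with one degree of freedom.
    """
    stat = max(2.0 * (lnl_complex - lnl_simple), 0.0)
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical maximized log-likelihoods (illustrative values only).
lnl_without_lgt = -1234.5  # simpler model: duplication and loss
lnl_with_lgt = -1230.0     # richer model: adds one LGT rate parameter

stat, p_value = likelihood_ratio_test(lnl_without_lgt, lnl_with_lgt)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value would favor retaining the extra parameter; a large one would suggest the simpler model is adequate for that dataset. When the extra parameter is constrained to a boundary value in the simpler model, the plain chi-square approximation can be conservative, so the result should be read with that caveat.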
Similar resources
Inferring models of multiscale copy number evolution for single-tumor phylogenetics
MOTIVATION Phylogenetic algorithms have begun to see widespread use in cancer research to reconstruct processes of evolution in tumor progression. Developing reliable phylogenies for tumor data requires quantitative models of cancer evolution that include the unusual genetic mechanisms by which tumors evolve, such as chromosome abnormalities, and allow for heterogeneity between tumor types and ...
ExaML version 3: a tool for phylogenomic analyses on supercomputers
MOTIVATION Phylogenies are increasingly used in all fields of medical and biological research. Because of the next generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. We present ExaML version 3, a dedicated production-level code for inferring phylogenies on whole-transcriptome and whole-genome alignments using supercomputers. RES...
Desirability-based architectural design of forms
Abstract The decisions and personal preferences of the designer are vital for all aspects and stages of the design. To elaborate, the designer has the central role in creation, development, detailing and construction of the built forms. Also, the scientific/engineering evaluations of the design models are carried out under the directions and decisions of the designer. The paper explores the con...
Breakpoint Phylogenies.
We describe a number of heuristics for inferring the gene orders of the hypothetical ancestral genomes in a fixed phylogeny. The optimization criterion is the minimum number of breakpoints (pairs of genes adjacent in one genome but not the other) in the gene orders of two genomes connected by an edge of the tree, summed over all edges. The key to the method is an exact solution for trees with th...
Robust Optimal Desirability Approach for Multiple Responses Optimization with Multiple Productions Scenarios
An optimal desirability function method is proposed to optimize multiple responses in multiple production scenarios, simultaneously. In dynamic environments, changes in production requirements in each condition create different production scenarios. Therefore, in multiple production scenarios like producing in several production lines with different technologies in a factory, various fitted r...
Journal title:
- Trends in microbiology
Volume 14, Issue 1
Pages: -
Publication date: 2006